Developed in the late 1950s by Brown, Holt, and Winters, exponential smoothing methods remain among the most practical forecasting tools.
These methods generate forecasts as weighted averages of past observations, where the weights assigned to older data decrease exponentially.
This emphasis on recent data reflects the assumption that recent observations are more relevant for predicting future values.
Simple Exponential Smoothing (SES): Suitable for series without trends or seasonal patterns, SES assumes a constant local mean over time. It uses all past observations with exponentially decreasing weights for older observations, set by the smoothing parameter \(\alpha\).
Holt’s Linear Trend Model: Extends SES to accommodate data with trends by adding a trend component, which is updated over time.
Holt-Winters Seasonal Model: Further extends Holt’s model to include a seasonal component, making it ideal for data with both trends and seasonality.
| Choose Model Based On: | SES | Holt’s | Holt-Winters |
|---|---|---|---|
| Trend? | ❌ No | ✔️ Yes | ✔️ Yes |
| Seasonality? | ❌ No | ❌ No | ✔️ Yes |
| Complexity | Low | Moderate | High |
Best for: Data without clear trends or seasonality (e.g., Algeria exports data below).
🫵 Using the algeria_economy dataset, create a plot to visualize Algeria’s exports as a percentage of GDP over time. Interpret the trend—what patterns or fluctuations do you observe, and what economic factors might explain them?
algeria_economy <- global_economy |>
filter(Country == "Algeria")
algeria_economy |>
autoplot(Exports) +
labs(y = "% of GDP", title = "Exports: Algeria")

✔️ High Volatility: Algeria’s exports (% of GDP) show significant fluctuations 📉📈, reflecting economic instability.
🛢️ Oil Dependency: Peaks align with oil booms 💰, while drops coincide with global oil crises and policy shifts ⚠️.
Simple Exponential Smoothing (SES): The SES model forecasts future values using a weighted average where weights decrease exponentially as observations get older. The weights are determined by the smoothing parameter \(\alpha\), which lies between 0 and 1. The forecast equation is:
\(\hat{Y}_{T+1} = \alpha Y_T + (1 - \alpha) \hat{Y}_T\)
📌 Where:
✅ \(\hat{Y}_{T+1}\) → Forecast for next period 📈
✅ \(Y_T\) → Actual observation at time \(T\) 📊
✅ \(\hat{Y}_T\) → Previous forecast value ⏳
✅ \(\alpha\) → Smoothing parameter \((0 \leq \alpha \leq 1)\) ⚙️
🔑 Key Insights:
✔️ Simplest model in the exponential smoothing family 🎯
✔️ Assumes a constant level (no trend or seasonality) 📉
✔️ Uses exponentially weighted past observations 🔄
✔️ Smoothing parameter \(\alpha\) controls how fast old data is discounted 🎚️
🔮 Forecast for \(T+1\): Based on the most recent observation with exponentially decaying weights!
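The forecast equation above is a one-line recursion. Here is a minimal sketch in Python (illustrative only, not the R workflow used in these notes), assuming the first observation is used as the initial forecast — other initializations are possible:

```python
def ses_next(y, alpha):
    """One-step-ahead SES forecast: repeatedly apply
    f <- alpha * y_t + (1 - alpha) * f, starting from f = y[0] (assumed init)."""
    f = y[0]                      # initial forecast: a modeling choice
    for obs in y:
        f = alpha * obs + (1 - alpha) * f
    return f

print(ses_next([10.0, 12.0, 11.0], alpha=0.5))  # → 11.0
```

Note that \(\alpha = 1\) collapses to the naïve method (the forecast equals the last observation), while \(\alpha = 0\) never updates the initial forecast.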
Let’s recall and compare:
Naïve Method
Forecasts assume the last observed value is the best predictor for all future periods: \[ \hat{y}_{T+h|T} = y_T \quad \text{for } h = 1, 2, \dots \]
📌 Key Properties:
✅ Weight Allocation: 100% weight to most recent observation \((y_T)\)
✅ Use Case: Stable series with no trend/seasonality
⚠️ Limitation: Ignores all historical data except last value
Average Method
Forecasts use the mean of all historical observations: \[ \hat{y}_{T+h|T} = \frac{1}{T} \sum_{t=1}^{T} y_t \quad \text{for } h = 1, 2, \dots \]
📌 Key Properties:
✅ Weight Allocation: Equal weights \((1/T)\) to all observations
✅ Use Case: Perfectly stationary data with mean-reverting behavior
⚠️ Limitation: Fails to adapt to trends or changing patterns
Simple Exponential Smoothing
Balances naïve and average methods using exponentially decaying weights
Forecasts are weighted averages where weights decay exponentially: \[
\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha)y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots
\] Where:
- \(0 \leq \alpha \leq 1\): the smoothing parameter, which controls the rate at which the weights decrease
- Weights decay geometrically: \(\alpha(1-\alpha)^j\) for \(j=0,1,2,\dots\)
Weight Distribution Examples:
| \(\alpha\) | Weight on \(y_T\) | Weight on \(y_{T-1}\) | Weight on \(y_{T-2}\) | Sum ≈ 1? |
|---|---|---|---|---|
| 0.2 | 0.2 | 0.16 | 0.128 | Yes |
| 0.5 | 0.5 | 0.25 | 0.125 | Yes |
| 0.8 | 0.8 | 0.16 | 0.032 | Yes |
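The rows of this table follow directly from the weight formula \(\alpha(1-\alpha)^j\). A quick Python check (illustrative only — not part of the R workflow):

```python
def ses_weights(alpha, n):
    """First n weights alpha*(1-alpha)^j placed on y_T, y_{T-1}, ..."""
    return [alpha * (1 - alpha) ** j for j in range(n)]

for a in (0.2, 0.5, 0.8):
    print(a, [round(w, 3) for w in ses_weights(a, 3)])

# The full (infinite) weight sequence sums to 1 for any 0 < alpha <= 1,
# which is why SES forecasts are genuine weighted averages:
print(round(sum(ses_weights(0.5, 100)), 10))  # → 1.0
```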
📌 Key Advantages:
✅ Adapts to recent trends better than naïve/average methods
✅ Flexible weighting via \(\alpha\):
- \(\alpha \to 1\) ≈ Naïve method (high reactivity)
- \(\alpha \to 0\) ≈ Average method (strong smoothing)
✅ Computationally efficient
The company has accumulated the demand data in the following table for its computers for the past 12 months.
a.Compute an exponentially smoothed forecast, using an α value of 0.3
b. Compute an exponentially smoothed forecast, using an α value of 0.5
| Month | Orders Delivered per Month |
|---|---|
| January | 37 |
| February | 40 |
| March | 41 |
| April | 37 |
| May | 45 |
| June | 50 |
| July | 43 |
| August | 47 |
| September | 56 |
| October | 52 |
| November | 55 |
| December | 54 |
Solution
Initial forecast: 37 (the first observed value is used to initialize the forecast)
| Month | \(D_t\) | Forecast \(F_t\) \((\alpha = 0.30)\) |
|---|---|---|
| January | 37 | - |
| February | 40 | \(0.30 (37) + 0.70 (37) = 37\) |
| March | 41 | \(0.30 (40) + 0.70 (37) = 37.9\) |
| April | 37 | \(0.30 (41) + 0.70 (37.9) = 38.83\) |
| May | 45 | \(0.30(37) + 0.70 (38.83) = 38.28\) |
| June | 50 | \(0.30(45) + 0.70(38.28) = 40.296\) |
| July | 43 | \(0.30(50) + 0.70(40.296) = 43.21\) |
| August | 47 | \(0.30(43) + 0.70(43.21) = 43.15\) |
| September | 56 | \(0.30(47) + 0.70(43.15) = 44.30\) |
| October | 52 | \(0.30(56) + 0.70(44.30) = 47.81\) |
| November | 55 | \(0.30(52) + 0.70(47.81) = 49.06\) |
| December | 54 | \(0.30(55) + 0.70(49.06) = 50.84\) |
| January | - | \(0.30(54) + 0.70 (50.84) = 51.79\) |
b. Compute an exponentially smoothed forecast, using an α value of 0.5
| Month | \(D_t\) | Forecast \(F_t\) \((\alpha = 0.50)\) |
|---|---|---|
| January | 37 | - |
| February | 40 | \(0.50(37) + 0.50(37) = 37\) |
| March | 41 | \(0.50(40) + 0.50(37) = 38.5\) |
| April | 37 | \(0.50(41) + 0.50(38.5) = 39.75\) |
| May | 45 | \(0.50(37) + 0.50(39.75) = 38.375\) |
| June | 50 | \(0.50(45) + 0.50(38.375) = 41.6875\) |
| July | 43 | \(0.50(50) + 0.50(41.6875) = 45.84375\) |
| August | 47 | \(0.50(43) + 0.50(45.84375) = 44.421875\) |
| September | 56 | \(0.50(47) + 0.50(44.421875) = 45.7109375\) |
| October | 52 | \(0.50(56) + 0.50(45.7109375) = 50.8554688\) |
| November | 55 | \(0.50(52) + 0.50(50.8554688) = 51.4277344\) |
| December | 54 | \(0.50(55) + 0.50(51.4277344) = 53.2138672\) |
| January | - | \(0.50(54) + 0.50(53.2138672) = 53.6069336\) |
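The hand calculations above are easy to check by replaying the recursion at full precision. This Python sketch (illustrative; the notes’ own workflow below uses R) initializes the forecast at January’s demand:

```python
demand = [37, 40, 41, 37, 45, 50, 43, 47, 56, 52, 55, 54]

def ses_forecasts(y, alpha):
    """Forecast series F_t: F(Feb), F(Mar), ..., F(next January)."""
    f = [float(y[0])]             # initial forecast = first observation
    for obs in y:
        f.append(alpha * obs + (1 - alpha) * f[-1])
    return f[1:]

print(round(ses_forecasts(demand, 0.3)[-1], 2))  # → 51.79 (next January, alpha = 0.3)
print(round(ses_forecasts(demand, 0.5)[-1], 2))  # → 53.61 (next January, alpha = 0.5)
```

The α = 0.5 result differs slightly from the R `ses()` output below because `ses()` estimates the initial level rather than fixing it at 37.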
# Load the forecast package
library(forecast)
library(TSstudio)
# Original time series data
original_series <- c(37, 40, 41, 37, 45, 50, 43, 47, 56, 52, 55, 54)
# Create time series object
ts_data <- ts(original_series)
# Exponential smoothing forecasts with alpha = 0.3
forecast_03 <- forecast(ses(ts_data, alpha = 0.3), h = 1)
# Exponential smoothing forecasts with alpha = 0.5
forecast_05 <- forecast(ses(ts_data, alpha = 0.5), h = 1)

forecast_03
   Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
13        51.8469 44.30635 59.38745 40.31462 63.37917

forecast_05
   Point Forecast    Lo 80    Hi 80   Lo 95    Hi 95
13       53.60739 46.99172 60.22305 43.4896 63.72517

accuracy(forecast_03)
                   ME     RMSE      MAE     MPE     MAPE      MASE      ACF1
Training set 3.047576 5.371262 4.330679 5.50894 8.935072 0.9721932 0.1117925

accuracy(forecast_05)
                   ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
Training set 2.458106 4.712451 3.743754 4.608617 7.902775 0.8404346 -0.1994535
ex <- ses(ts_data, alpha = 0.3, h = 5)
ex1 <- ses(ts_data, alpha = 0.5, h = 5)

In the context of forecasting, exponential smoothing is a technique used to predict future values based on past observations. The smoothing constant α (alpha) plays a critical role in this method. It determines how much weight is given to recent data points versus older ones.
When α is close to 1, the forecast is highly influenced by recent observations, making it sensitive to short-term fluctuations.
When α is close to 0, the forecast relies more on historical data, resulting in a smoother prediction.
Adjusting \(\alpha\) based on demand trends is essential for achieving accurate forecasts. If the demand is stable, a smaller \(\alpha\) is preferable. Conversely, if the demand exhibits significant fluctuations, a larger \(\alpha\) is recommended.
Remember that choosing the right α value requires a balance between responsiveness to recent changes and stability in the forecast. 📈
Alternative: using plot_forecast from the TSstudio package

plot_forecast(ex1)

Weighted Average Form
Forecasts combine the latest observation \((y_t)\) and prior forecast \((\hat{y}_{t|t-1})\) with smoothing parameter \(\alpha\): \[ \hat{y}_{T+1|T} = \alpha y_T + (1-\alpha)\hat{y}_{T|T-1} \]
The process has to start somewhere, so we let the first fitted value at time 1 be denoted by \(\ell_0\) (which we will have to estimate).
Recursive Expansion: \[ \begin{align*} \hat{y}_{2|1} &= \alpha y_1 + (1-\alpha)\ell_0 \\ \hat{y}_{3|2} &= \alpha y_2 + \alpha(1-\alpha)y_1 + (1-\alpha)^2\ell_0 \\ &\vdots \\ \hat{y}_{T+1|T} &= \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T\ell_0 \end{align*} \]
📌 Key Properties:
✅ Exponentially decaying weights: \(\alpha(1-\alpha)^j\)
✅ Initial level \((\ell_0)\) becomes negligible as \(T\) grows
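The equivalence of the recursive form and the expanded weighted-sum form can be verified numerically. A quick Python check with arbitrary made-up numbers (illustrative only):

```python
alpha, l0 = 0.3, 5.0
y = [8.0, 6.5, 7.2, 9.1, 8.4]    # arbitrary example values

# Recursive form: yhat_{t+1|t} = alpha*y_t + (1-alpha)*yhat_{t|t-1}
f = l0
for obs in y:
    f = alpha * obs + (1 - alpha) * f

# Expanded form: sum_j alpha(1-alpha)^j y_{T-j} + (1-alpha)^T * l0
T = len(y)
expanded = sum(alpha * (1 - alpha) ** j * y[T - 1 - j] for j in range(T))
expanded += (1 - alpha) ** T * l0

print(abs(f - expanded) < 1e-12)  # → True: both forms give the same forecast
```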
Component Form
We have
\[ \hat{y}_{T+1|T} = \alpha y_T + (1-\alpha)\hat{y}_{T|T-1} \]
Decomposes forecasts into level component \((\ell_t)\):
Forecast Equation:
\[
\hat{y}_{t+h|t} = \ell_t \quad (h \geq 1)
\]
Smoothing Equation:
\[
\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}
\]
📌 Interpretation:
✅ \(\ell_t\) = Estimated series level at time \(t\)
✅ Identical to weighted average form when \(\ell_t = \hat{y}_{t+1|t}\)
✅ Foundation for advanced models (trend/seasonal components)
Flat Forecasts
All future predictions equal the final level estimate: \[
\hat{y}_{T+h|T} = \ell_T \quad \text{for } h = 2, 3, \dots
\]
📌 When to Use:
✅ No trend/seasonality in data
✅ Short-term forecasts where trend is negligible
⚠️ Limitation: Fails for trending/seasonal series
Parameter Optimization
Parameters \((\alpha, \ell_0)\) are estimated by minimizing SSE: \[ \text{SSE} = \sum_{t=1}^T (y_t - \hat{y}_{t|t-1})^2 \]
| Aspect | Regression | Exponential Smoothing |
|---|---|---|
| Parameter Linearity | Linear in coefficients | Non-linear in \((\alpha, \ell_0)\) (a non-linear minimisation problem) |
| Solution Method | Closed-form (matrix inversion) | Numerical optimization (e.g., L-BFGS) |
| Initial Values | Not applicable | Critical for early forecasts |
📌 Implementation Steps:
1. Define SSE function using recursive equations
2. Use optimization algorithms to find \(\alpha \in [0,1]\) and \(\ell_0\)
3. Validate with metrics like AIC or cross-validation
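A crude illustration of steps 1–2 in Python: grid-search \((\alpha, \ell_0)\) to minimize the SSE. Real software (e.g., fable’s `ETS()`) uses proper numerical optimizers; the grid, the \(\ell_0\) range, and the reuse of the demand data here are all assumptions made for illustration:

```python
def sse(y, alpha, l0):
    """Sum of squared one-step-ahead errors for a given (alpha, l0)."""
    f, total = l0, 0.0
    for obs in y:
        total += (obs - f) ** 2   # error y_t - yhat_{t|t-1}
        f = alpha * obs + (1 - alpha) * f
    return total

y = [37, 40, 41, 37, 45, 50, 43, 47, 56, 52, 55, 54]
best_alpha, best_l0 = min(
    ((a / 100, l0) for a in range(101) for l0 in range(30, 46)),
    key=lambda p: sse(y, *p),
)
print(best_alpha, best_l0, round(sse(y, best_alpha, best_l0), 2))
```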
Why Not Just Use the Last Observation?
A flat forecast is not the same as repeating the last observation! SES smooths out noise by weighting all past observations (not just the most recent one). For example:
If the last observation is an outlier (e.g., a sudden spike), SES will dampen its impact using the smoothing parameter \(\alpha \in [0,1]\)
A flat forecast represents a smoothed average, not a naive “copy-paste” of the latest value.
In this example, simple exponential smoothing is applied to forecast exports of goods and services from Algeria.
# Estimate parameters
fit <- algeria_economy |>
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc <- fit |>
forecast(h = 5)

✔️ Forecast Plot (2018–2022)
✔️ Fitted Values (1960–2017): One-step-ahead estimates plotted alongside actual data 📈.
✔️ Uncertainty in Forecasts: Future exports have high variability ⚠️.
✔️ Key Warning: Ignoring uncertainty in point forecasts interpretation can be misleading ❗.
Extends simple exponential smoothing to capture trends via level (\(\ell_t\)) and slope (\(b_t\)) components:
Forecast Equation:
\[
\hat{y}_{t+h|t} = \ell_t + h b_t \quad (h = 1, 2, \dots)
\]
Level Equation:
\[
\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})
\]
Trend Equation:
\[
b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}
\]
📌 Key Parameters:
- \(\alpha\): Smooths the level (reactivity to new observations)
- \(\beta^*\): Smooths the trend (reactivity to trend changes)
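The three equations translate directly into code. Here is an illustrative Python sketch (not the fable implementation; the example numbers are invented). On an exactly linear series, Holt’s method reproduces the line, which makes a convenient sanity check:

```python
def holt_forecast(y, alpha, beta, l0, b0, h):
    """Holt's linear trend: update level and slope, then forecast l_T + h*b_T."""
    l, b = l0, b0
    for obs in y:
        l_prev, b_prev = l, b
        l = alpha * obs + (1 - alpha) * (l_prev + b_prev)   # level equation
        b = beta * (l - l_prev) + (1 - beta) * b_prev       # trend equation
    return l + h * b                                        # forecast equation

# Perfectly linear data y_t = 2t: the method tracks it exactly,
# so the 3-step-ahead forecast continues the line to 16.
print(holt_forecast([2.0, 4.0, 6.0, 8.0, 10.0], 0.5, 0.5, 0.0, 2.0, h=3))  # → 16.0
```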
🫵🧪 Exercise: Plotting Population Trends
Using the global_economy dataset, generate a time series plot of Australia’s population (in millions) over time. (Convert the population to millions by dividing by 1,000,000.)
We will apply Holt’s method to this series. ⬇️⬇️
fit <- aus_economy |>
model(
AAN = ETS(Pop ~ error("A") + trend("A") + season("N"))
)
fc <- fit |> forecast(h = 10)

report(fit)

Series: Pop
Model: ETS(A,A,N)
Smoothing parameters:
alpha = 0.9999
beta = 0.3266366
Initial states:
l[0] b[0]
10.05414 0.2224818
sigma^2: 0.0041
AIC AICc BIC
-76.98569 -75.83184 -66.68347
Smoothing parameters:
α (alpha) = 0.9999:
This is the level smoothing parameter. A value near 1 means the model gives strong weight to the most recent observations when updating the level.
β (beta) = 0.3266:
This is the trend smoothing parameter. A moderate value like this indicates that the trend component is updated, but also retains some memory of the past.
Initial states: ℓ₀ = 10.05414: Initial level of the series.
b₀ = 0.22248: Initial growth (trend) per period.
σ² = 0.0041: Variance of residuals (used for prediction intervals).
Model selection criteria: AIC, AICc, BIC — useful for comparing models (lower is better).
Holt’s linear trend assumes trends continue indefinitely, often leading to over-forecasting for long horizons. The damped trend method introduces a damping parameter \(\phi\) to gradually reduce the trend’s impact.
Forecast Equation:
\[
\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \dots + \phi^h) b_t
\]
Level Equation:
\[
\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})
\]
Trend Equation:
\[
b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)\phi b_{t-1}
\]
📌 Parameters:
- \(\alpha\): Smooths the level
- \(\beta^*\): Smooths the trend
- \(\phi \in (0.8, 0.98)\): Dampens trend over time
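Because \(\phi + \phi^2 + \dots + \phi^h\) is a geometric sum, long-horizon damped forecasts approach the finite limit \(\ell_T + b_T\,\phi/(1-\phi)\) instead of growing without bound. A quick numeric illustration in Python (values invented):

```python
def damped_forecast(level, trend, phi, h):
    """hat{y}_{T+h|T} = l_T + (phi + phi^2 + ... + phi^h) * b_T"""
    damp = sum(phi ** i for i in range(1, h + 1))
    return level + damp * trend

l_T, b_T, phi = 100.0, 2.0, 0.9
print(damped_forecast(l_T, b_T, phi, 1))    # 101.8: short horizon, almost full trend
print(damped_forecast(l_T, b_T, phi, 500))  # ~118.0: the limit 100 + 2*0.9/(1-0.9)
```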
We apply two ETS models to forecast Australian population:
Holt’s Linear Trend: ETS(A,A,N)
Damped Trend: ETS(A,Ad,N) with damping parameter ϕ = 0.9
| Year | Holt Forecast (Millions) | Damped Forecast (Millions) |
|---|---|---|
| 2018 | 24.97 | 24.97 |
| 2027 | 28.29 | 27.92 |
✅ Both models start at the same point in 2018, but diverge as the forecast horizon increases.
✅ The damped trend model reduces the influence of the trend over time, resulting in slower forecast growth.
✅ This is useful when the historical trend is unlikely to persist indefinitely — a common situation in population forecasts, economic indicators, etc.
Why might damping be more realistic in population modeling?
Now we will run forecasts for the years 2018–2032 generated from Holt’s linear trend method and the damped trend method.
aus_economy |>
model(
`Holt's method` = ETS(Pop ~ error("A") +
trend("A") + season("N")),
`Damped Holt's method` = ETS(Pop ~ error("A") +
trend("Ad", phi = 0.9) + season("N"))
) |>
forecast(h = 15) |>
autoplot(aus_economy, level = NULL) +
labs(title = "Australian population",
y = "Millions")+
guides(colour = guide_legend(title = "Forecast"))

We have set the damping parameter to a relatively low number (ϕ = 0.90) to exaggerate the effect of damping for comparison. Usually, we would estimate ϕ along with the other parameters. We have also used a rather large forecast horizon (h = 15) to highlight the difference between a damped trend and a linear trend.
You are provided with a time series dataset that records the number of users connected to the internet via a server, observed every minute for 100 minutes.
Your goal is to compare the forecasting performance of the three exponential smoothing methods introduced so far:
Simple Exponential Smoothing (SES) — assumes no trend or seasonality
Holt’s Linear Method — incorporates a linear trend
Damped Trend Method — includes a trend that gradually levels off over time
The number of users starts around 85 and ends above 200.
This suggests a positive long-term trend over the 100-minute window.
There are multiple rises and falls, especially around minutes:
15–20 (first peak),
40–50 (second peak),
sharp drop around minute 60,
strong recovery from 70 to 100.
These cycles hint at short-term fluctuations, possibly due to user behavior (e.g., sessions starting/stopping).
While there are repeated ups and downs, they do not follow a fixed interval, so we likely have irregular cycles rather than true seasonality.
stretch_tsibble(.init = 10)
This prepares the data for rolling origin evaluation.
It creates multiple expanding windows starting with the first 10 observations.
Each window grows by 1 data point and is used to re-fit the models — great for simulating real-time forecasting.
model(...)
SES: Simple Exponential Smoothing (no trend or seasonality).
Holt: Holt’s Linear Trend method (additive trend).
Damped: Damped Trend method (trend is additive but damped over time using a damping parameter φ).
forecast(h = 1)
Forecast 1 step ahead from each window.
This simulates a real-time forecasting task, where you always predict just the next time point.
accuracy(www_usage)
Compares the predicted values against the actual observed values from the original www_usage data.
It computes metrics like:
MAE (Mean Absolute Error),
RMSE (Root Mean Squared Error),
MAPE (Mean Absolute Percentage Error),and others.
# A tibble: 3 × 10
.model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Damped Test 0.288 3.69 3.00 0.347 2.26 0.663 0.636 0.336
2 Holt Test 0.0610 3.87 3.17 0.244 2.38 0.701 0.668 0.296
3 SES Test 1.46 6.05 4.81 0.904 3.55 1.06 1.04 0.803
Why is this not standard cross-validation?
In regular cross-validation (like k-fold), data can be randomly split because order doesn’t matter.
But for time series, order is everything — we can’t train on future and test on past.
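The expanding-window logic is simple to sketch in plain Python (illustrative, not the fable implementation): fit on the first t points, forecast point t + 1, record the error, then grow the window by one.

```python
def rolling_one_step_mae(y, one_step_forecast, init=10):
    """Expanding-window evaluation: forecast each point using only the points
    before it, starting once `init` observations are available
    (mirroring stretch_tsibble(.init = 10) |> ... |> forecast(h = 1))."""
    errors = []
    for t in range(init, len(y)):
        train = y[:t]                          # only the past is visible
        errors.append(abs(y[t] - one_step_forecast(train)))
    return sum(errors) / len(errors)

naive = lambda train: train[-1]                # naive method as a stand-in model
trend_series = [float(t) for t in range(100)]  # toy series rising by 1 per step
print(rolling_one_step_mae(trend_series, naive))  # → 1.0: naive lags the trend by one step
```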
✅ Answer:
After evaluating the models using time series cross-validation, Damped Holt’s method consistently produced the lowest MAE and RMSE values compared to Simple Exponential Smoothing and Holt’s linear trend method.
✅ Answer:
Damped Holt’s method includes a damping parameter (φ < 1) that slows down the trend over time, avoiding unrealistic long-term forecasts. This feature makes it more reliable for series with trend but no indefinite growth, like internet usage over a short time window.
✅ Answer:
Now that we’ve identified Damped Holt’s method as the most accurate, we apply it to the entire dataset to generate forecasts for future minutes. This ensures we use all available information for the final forecast.
fit <- www_usage |>
model(
Damped = ETS(value ~ error("A") + trend("Ad") +
season("N"))
)
# Estimated parameters:
tidy(fit)

# A tibble: 5 × 3
.model term estimate
<chr> <chr> <dbl>
1 Damped alpha 1.00
2 Damped beta 0.997
3 Damped phi 0.815
4 Damped l[0] 90.4
5 Damped b[0] -0.0173
| Parameter | Value | Meaning |
|---|---|---|
| alpha | 0.9999 | Very high level smoothing — your model gives almost full weight to the most recent observation for updating the level. |
| beta | 0.9966 | Also very high — the trend component is very reactive to recent changes. |
| phi | 0.8150 | The damping parameter — less than 1, so the trend is being damped, i.e., flattened over time to avoid explosive forecasts. |
| l[0] | 90.35 | The estimated initial level of the series at time zero. |
| b[0] | -0.0173 | The estimated initial trend — slightly negative, indicating a very small downward slope at the beginning. |
gg_tsresiduals(fit)

| Criterion | Result | Comment |
|---|---|---|
| Residuals ≈ 0-mean | ✅ | No trend detected. |
| No autocorrelation | ⚠️ | Some lags significant → possible underfitting. |
| Normality | ✅ | Acceptably close. |
| Ljung-Box test | ❌ | (From earlier: p-value = 0.0012) → Residuals not white noise. |
augment(fit) |> features(.resid, ljung_box, lag = 10)

# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 Damped 29.0 0.00125
Null Hypothesis (H₀): The residuals are uncorrelated (i.e., white noise).
Alternative Hypothesis (H₁): There is autocorrelation in the residuals.
The p-value is 0.00125, which is very small (< 0.05).
❌ So, we reject the null hypothesis: this suggests that the residuals are not white noise.
➡️ The model (Damped Holt) may not have captured all the patterns in the data.
❌ Cross-validation does not guarantee the optimal model — it only tells you which model performs best among the ones you’ve tried.
✔️ Forecast Behavior: Shows a decreasing trend 📉, flattening due to the damping parameter (0.815).
✔️ Prediction Intervals: Wide intervals 📏 reflect historical data variation
✔️ Method Selection:
Easy choice here ✅: Both MSE & MAE favored damped Holt’s method.
In other cases ❓: Different accuracy measures may suggest different methods, requiring judgment.
✔️ Forecasting Insight: No single method is best for all cases 🔄—evaluation should be task-specific & frequent 📊.
Holt-Winters Seasonal Methods
Holt-Winters methods extend Holt’s linear trend to capture seasonality using three components:
- Level (\(\ell_t\))
- Trend (\(b_t\))
- Seasonal (\(s_t\))
Two variants exist:
1. Additive: Seasonal variations are constant over time.
2. Multiplicative: Seasonal variations scale with series level.
Use when seasonal fluctuations are roughly constant (e.g., fixed holiday sales boosts).
Forecast:
\[
\hat{y}_{t+h|t} = \ell_t + h b_t + s_{t+h-m(k+1)}
\]
Level:
\[
\ell_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})
\]
Trend:
\[
b_t = \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}
\]
Seasonal:
\[
s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}
\]
📌 Parameters:
- \(\alpha\): Smooths level
- \(\beta^*\): Smooths trend
- \(\gamma\): Smooths seasonality (\(0 \leq \gamma \leq 1-\alpha\))
- \(m\): Seasonal period (e.g., 12 for monthly data)
- \(k = \lfloor (h-1)/m \rfloor\): Ensures seasonal indices from the final year
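To make the updates and the seasonal indexing concrete, here is an illustrative Python sketch of the additive Holt-Winters recursions (the initial states are supplied by hand, whereas fable’s `ETS()` estimates them; the quarterly example data are invented). On a series that is exactly a linear trend plus a fixed seasonal pattern, the updates reproduce the pattern:

```python
def holt_winters_additive(y, m, alpha, beta, gamma, l0, b0, s_init, H):
    """Additive Holt-Winters; s_init holds the m seasonal values for the year
    before the sample. Returns forecasts for h = 1..H."""
    l, b, s = l0, b0, list(s_init)
    for t, obs in enumerate(y):
        l_prev, b_prev, s_old = l, b, s[t % m]          # s_old plays s_{t-m}
        l = alpha * (obs - s_old) + (1 - alpha) * (l_prev + b_prev)
        b = beta * (l - l_prev) + (1 - beta) * b_prev
        s[t % m] = gamma * (obs - l_prev - b_prev) + (1 - gamma) * s_old
    T = len(y)
    # The index picks the most recently updated value for the forecast season
    return [l + h * b + s[(T + h - 1) % m] for h in range(1, H + 1)]

season = [1.0, -1.0, 2.0, -2.0]                    # quarterly pattern, m = 4
y = [10.0 + t + season[t % 4] for t in range(12)]  # trend + fixed seasonality
fc = holt_winters_additive(y, 4, 0.5, 0.5, 0.5, 9.0, 1.0, season, H=4)
print(fc)  # → [23.0, 22.0, 26.0, 23.0]: the trend and pattern continue
```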
The seasonal component equation is reparameterized to simplify interpretation.
γ in the standard equation is a scaled version of γ*, adjusted by the level smoothing parameter α.
This scaling ensures the seasonal parameter γ stays within valid bounds, preventing overfitting and maintaining model stability.
Indepth explanation ⬇️⬇️
Seasonal Component Equation Derivation
Original Seasonal Equation:
The seasonal component is initially expressed as:
\[
s_t = \gamma^* (y_t - \ell_t) + (1 - \gamma^*) s_{t-m}
\]
Substitute the Level Equation:
From the Holt-Winters additive method, the level equation is:
\[
\ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})
\]
Substitute \(\ell_t\) into the seasonal equation:
\[
s_t = \gamma^* \left[ y_t - \left( \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \right) \right] + (1 - \gamma^*) s_{t-m}
\]
Simplify the Expression:
Expand and rearrange terms:
\[
\begin{align*}
s_t &= \gamma^* \left[ y_t - \alpha y_t + \alpha s_{t-m} - (1 - \alpha)(\ell_{t-1} + b_{t-1}) \right] + (1 - \gamma^*) s_{t-m} \\
&= \gamma^* \left[ (1 - \alpha)y_t + \alpha s_{t-m} - (1 - \alpha)(\ell_{t-1} + b_{t-1}) \right] + (1 - \gamma^*) s_{t-m} \\
&= \gamma^*(1 - \alpha)(y_t - \ell_{t-1} - b_{t-1}) + \gamma^* \alpha s_{t-m} + (1 - \gamma^*) s_{t-m} \\
&= \gamma^*(1 - \alpha)(y_t - \ell_{t-1} - b_{t-1}) + \left[ 1 - \gamma^*(1 - \alpha) \right] s_{t-m}
\end{align*}
\]
Reparameterize with \(\gamma = \gamma^*(1 - \alpha)\):
Let \(\gamma = \gamma^*(1 - \alpha)\). Substitute into the equation:
\[
s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}
\]
This matches the standard Holt-Winters seasonal smoothing equation.
Parameter Constraints:
The reparameterization ensures the seasonal smoothing parameter \(\gamma\) scales with \(\alpha\), preventing overfitting and maintaining model stability.
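The derivation can be double-checked numerically: the original form (using \(\gamma^*\) and the current level \(\ell_t\)) and the reparameterized form (using \(\gamma = \gamma^*(1-\alpha)\) and the previous level plus trend) produce identical seasonal updates. The example values below are arbitrary:

```python
# Arbitrary example values
alpha, gamma_star = 0.4, 0.5
y_t, s_old, l_prev, b_prev = 12.0, 1.5, 9.0, 0.8

# Level equation gives l_t
l_t = alpha * (y_t - s_old) + (1 - alpha) * (l_prev + b_prev)

# Original seasonal form with gamma* and l_t
original = gamma_star * (y_t - l_t) + (1 - gamma_star) * s_old

# Reparameterized form with gamma = gamma*(1 - alpha) and (l_{t-1} + b_{t-1})
gamma = gamma_star * (1 - alpha)
standard = gamma * (y_t - l_prev - b_prev) + (1 - gamma) * s_old

print(abs(original - standard) < 1e-12)  # → True: the two forms agree
```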
Use when seasonal fluctuations grow with series level (e.g., holiday sales as a % of revenue).
Forecast:
\[
\hat{y}_{t+h|t} = (\ell_t + h b_t) \cdot s_{t+h-m(k+1)}
\]
Level:
\[
\ell_t = \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1})
\]
Trend:
\[
b_t = \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}
\]
Seasonal:
\[
s_t = \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)s_{t-m}
\]
📌 Key Differences:
- Additive: Seasonal component (\(s_t\)) is absolute (e.g., +100 units each December).
- Multiplicative: Seasonal component (\(s_t\)) is relative (e.g., ×1.2 each December).
We apply Holt-Winters’ method with both additive and multiplicative seasonality to forecast quarterly visitor nights in Australia spent by domestic tourists.
The plot shows the data from 1998–2017, and the forecasts for 2018–2020. The data show an obvious seasonal pattern, with peaks observed in the March quarter of each year, corresponding to the Australian summer.
Q: Why might we compare both additive and multiplicative ETS models when forecasting Australian domestic holiday trips?
A: We compare both because:
If tourism increases over time and seasonal fluctuations also grow (e.g., peak seasons become more extreme), a multiplicative model may better capture that behavior. But if seasonality and trends stay stable in size, the additive model might be more appropriate.
❓Using the tourism dataset, filter for holiday-purpose trips and aggregate total trips per quarter (in millions). Then, fit both an additive and a multiplicative ETS model, forecast for the next 3 years, and generate a plot comparing both forecasts. What do you observe about the differences in the two models’ forecasts?
fit <- aus_holidays |>
model(
additive = ETS(Trips ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Trips ~ error("M") + trend("A") +
season("M"))
)

fc <- fit |> forecast(h = "3 years")
fc |>
autoplot(aus_holidays, level = NULL) +
labs(title="Australian domestic tourism",
y="Overnight trips (millions)") +
guides(colour = guide_legend(title = "Forecast"))

report(fit)

# A tibble: 2 × 9
.model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 additive 0.193 -105. 229. 231. 250. 0.174 0.184 0.321
2 multiplicative 0.00212 -104. 227. 229. 248. 0.170 0.183 0.0328
Damping is possible with both additive and multiplicative Holt-Winters’ methods. A method that often provides accurate and robust forecasts for seasonal data is the Holt-Winters method with a damped trend and multiplicative seasonality:
Forecast:
\[
\hat{y}_{t+h|t} = \left[ \ell_t + (\phi + \phi^2 + \dots + \phi^h) b_t \right] \cdot s_{t+h-m(k+1)}
\]
Level:
\[
\ell_t = \alpha \left( \frac{y_t}{s_{t-m}} \right) + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})
\]
Trend:
\[
b_t = \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*)\phi b_{t-1}
\]
Seasonal:
\[
s_t = \gamma \left( \frac{y_t}{\ell_{t-1} + \phi b_{t-1}} \right) + (1-\gamma)s_{t-m}
\]
📌 Parameters:
- \(\phi \in [0.8, 0.98]\): Dampens trend over time
- \(\alpha\): Level smoothing (reactivity to new data)
- \(\beta^*\): Trend smoothing
- \(\gamma\): Seasonal smoothing
- \(m\): Seasonal period (e.g., 7 for daily data with weekly seasonality)
The Holt-Winters method can also be used for daily data, where the seasonal period is m = 7 and the appropriate unit of time for h is days. Here we forecast pedestrian traffic at a busy Melbourne train station in July 2016.
🧠 Task for You:
Using the pedestrian dataset, create a plot showing the 2-week forecast of daily pedestrian counts (in thousands) at Southern Cross Station, using the ETS(M, Ad, M) model.
Make sure to:
Filter data from July 2016
Use index_by() and summarise() to get daily totals in thousands
Forecast 2 weeks ahead
Plot actual and forecasted values up to August 14, 2016
Title your plot and label the y-axis as: "Pedestrians ('000)"
1. pedestrian |> filter(...)
We filter the pedestrian dataset to include:
Only data from Southern Cross Station
Only data from July 1, 2016 onward
2. index_by(Date)
This groups the data by day (i.e., aggregates hourly data into daily totals).
Think of this like creating a time series where each point represents a whole day.
summarise(Count = sum(Count)/1000)
This creates a new column Count by:
Summing all hourly pedestrian counts for each day
Dividing by 1000 to express the total in thousands of people
(makes the y-axis easier to read)
To fit the model, we use only July 2016 data (training window).
model(...)
We fit an ETS model with the following components:
error("M"): Multiplicative error (used when variation grows with level)
trend("Ad"): Additive damped trend (growth slows down over time)
season("M"): Multiplicative seasonality (seasonal variation scales with the level)
This model is commonly referred to as Holt-Winters multiplicative with damped trend.
|> forecast(h = "2 weeks")
We generate a 2-week ahead forecast (i.e., 14 daily points into August).
We plot:
Actual observed values up to August 14, 2016
Forecasted values from August 1 to 14
The y-axis is labeled in thousands of pedestrians
Title is descriptive
sth_cross_ped <- pedestrian |>
filter(Date >= "2016-07-01",
Sensor == "Southern Cross Station") |>
index_by(Date) |>
summarise(Count = sum(Count)/1000)
sth_cross_ped |>
filter(Date <= "2016-07-31") |>
model(
hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
) |>
forecast(h = "2 weeks") |>
autoplot(sth_cross_ped |> filter(Date <= "2016-08-14")) +
labs(title = "Daily traffic: Southern Cross",
y="Pedestrians ('000)")

Forecasts of daily pedestrian traffic at the Southern Cross railway station, Melbourne.
Clearly the model has identified the weekly seasonal pattern and the increasing trend at the end of the data, and the forecasts are a close match to the test data.
Why Use This Method?
Avoids Over-Forecasting: Damped trend (ϕ) prevents unrealistic long-term growth.
Handles Scaling Seasonality: Multiplicative seasonality adapts to patterns proportional to series level.
Flexible for Daily Data: Works with non-traditional seasonal periods (e.g., m=7 for daily weekly cycles).
Exponential smoothing methods are classified by their trend (T) and seasonal (S) components. Each method is labeled as (T,S), where:

- Trend (T): N (None), A (Additive), A_d (Additive Damped)
- Seasonal (S): N (None), A (Additive), M (Multiplicative)

This creates 9 possible combinations (Table 8.5). Below, we outline each method’s equations, use cases, and key properties.
| Trend (T) | Seasonal (S) | Method Name | Key Equations (Simplified) | Use Case Example |
|---|---|---|---|---|
| N | N | Simple Exponential Smoothing | \(\hat{y}_{t+h\|t} = \ell_t\); \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\) | Stationary data (no trend/seasonality) |
| N | A | Additive Seasonal | \(\hat{y}_{t+h\|t} = \ell_t + s_{t+h-m(k+1)}\); \(s_t = \gamma(y_t - \ell_{t-1}) + (1-\gamma)s_{t-m}\) | Constant seasonal patterns |
| N | M | Multiplicative Seasonal | \(\hat{y}_{t+h\|t} = \ell_t \cdot s_{t+h-m(k+1)}\); \(s_t = \gamma(y_t / \ell_{t-1}) + (1-\gamma)s_{t-m}\) | Seasonal patterns scaling with level |
| A | N | Holt’s Linear Trend | \(\hat{y}_{t+h\|t} = \ell_t + h b_t\); \(\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})\); \(b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}\) | Trending data (no seasonality) |
| A | A | Additive Holt-Winters | \(\hat{y}_{t+h\|t} = \ell_t + h b_t + s_{t+h-m(k+1)}\); \(s_t = \gamma(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}\) | Additive trend + fixed seasonality |
| A | M | Multiplicative Holt-Winters | \(\hat{y}_{t+h\|t} = (\ell_t + h b_t) \cdot s_{t+h-m(k+1)}\); \(s_t = \gamma(y_t / (\ell_{t-1} + b_{t-1})) + (1-\gamma)s_{t-m}\) | Trending data + scaling seasonality |
| A_d | N | Damped Trend | \(\hat{y}_{t+h\|t} = \ell_t + \phi_h b_t\); \(\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})\); \(b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)\phi b_{t-1}\) | Long-term forecasts with fading trend |
| A_d | A | Damped Additive Holt-Winters | \(\hat{y}_{t+h\|t} = \ell_t + \phi_h b_t + s_{t+h-m(k+1)}\); \(s_t = \gamma(y_t - \ell_{t-1} - \phi b_{t-1}) + (1-\gamma)s_{t-m}\) | Damped trend + fixed seasonality |
| A_d | M | Damped Multiplicative HW | \(\hat{y}_{t+h\|t} = [\ell_t + \phi_h b_t] \cdot s_{t+h-m(k+1)}\); \(s_t = \gamma(y_t / (\ell_{t-1} + \phi b_{t-1})) + (1-\gamma)s_{t-m}\) | Damped trend + scaling seasonality |

Here \(\phi_h = \phi + \phi^2 + \dots + \phi^h\).
For all methods, the h-step forecast and smoothing equations depend on the components:

| Trend ↓ \ Seasonal → | N (None) | A (Additive) | M (Multiplicative) |
|---|---|---|---|
| N (None) | (N,N) | (N,A) | (N,M) |
| A (Additive) | (A,N) | (A,A) | (A,M) |
| A_d (Additive Damped) | (A_d,N) | (A_d,A) | (A_d,M) |
| Data Characteristics | Recommended Method |
|---|---|
| No trend/seasonality | (N,N) Simple Exponential Smoothing |
| Additive trend only | (A,N) Holt’s Linear Trend |
| Additive trend + fixed seasonality | (A,A) Additive Holt-Winters |
| Damped trend + scaling seasonality | (A_d,M) Damped Multiplicative HW |
Automatic Forecasting with ETS()
Series: Trips
Model: ETS(M,N,A)
Smoothing parameters:
alpha = 0.3484054
gamma = 0.0001000018
Initial states:
l[0] s[0] s[-1] s[-2] s[-3]
9.727072 -0.5376106 -0.6884343 -0.2933663 1.519411
sigma^2: 0.0022
AIC AICc BIC
226.2289 227.7845 242.9031
Smoothing parameters:
alpha = 0.348: controls how much the level is updated. A value of roughly 0.35 indicates that the level is moderately adjusted based on the most recent observation.
gamma = 0.0001: very low seasonal smoothing → the seasonal pattern is treated as almost fixed over time
Initial states:
Level (l[0]) and seasonal components (s[0], s[-1], …) are initialized
sigma² = 0.0022: model variance (lower is better)
AIC / BIC values: for model comparison (lower = better)
ETS() automatically selects the optimal structure based on your data, but understanding the components and parameters is essential for interpreting your forecasts and verifying that the chosen model makes sense in the context of your series.
A work by Dr. BENGHALEM Abdelhadi
abdelhadi.benghalem@gmail.com